Capacitated k-Center Problem with Vertex Weights
We study the capacitated k-center problem with vertex weights. It is a generalization of the well-known k-center problem. In this variant each vertex has a weight and a capacity. The assignment cost of a vertex to a center is given by the product of the weight of the vertex and its distance to
the center. The distances are assumed to form a metric. Each center can serve only as many vertices as its capacity. We show an n^{1-epsilon}-approximation hardness for this problem, for any epsilon > 0, where n is the number of vertices in the input. Both the capacitated and the weighted versions of the k-center problem can individually be approximated within a constant factor. Yet their common generalization cannot be approximated efficiently within a constant factor, unless P = NP. This problem, to the best of our knowledge, is the first facility location problem with metric distances known to have a super-constant inapproximability result. The hardness result generalizes easily to versions of the problem that take the p-norm of the assignment costs (weighted distances) as the objective function. We give n^{1 - 1/p - epsilon}-approximation hardness for this problem, for p > 1.
We complement the hardness result with a simple n-approximation algorithm for this problem. We also give a bi-criteria constant-factor approximation algorithm for the case of uniform capacities, which opens at most 2k centers.
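As a concrete illustration of the objective, the toy sketch below (a hypothetical instance and helper names of our own, not from the paper) evaluates the weighted k-center cost of an assignment and checks its capacity feasibility:

```python
def assignment_cost(dist, weights, assignment):
    """Weighted k-center objective: the maximum over all vertices of
    weight(v) * dist(v, center(v))."""
    return max(weights[v] * dist[v][c] for v, c in assignment.items())

def feasible(assignment, capacities):
    """Each center may serve at most capacity-many vertices."""
    load = {}
    for c in assignment.values():
        load[c] = load.get(c, 0) + 1
    return all(load[c] <= capacities[c] for c in load)

# Toy metric on 4 vertices (a path metric, so the triangle inequality holds).
dist = [[0, 1, 2, 3],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [3, 2, 1, 0]]
weights = [1, 2, 1, 3]
capacities = {0: 2, 3: 2}              # centers opened at vertices 0 and 3
assignment = {0: 0, 1: 0, 2: 3, 3: 3}  # respects both capacities

print(feasible(assignment, capacities))            # True
print(assignment_cost(dist, weights, assignment))  # 2 (vertex 1: weight 2 * distance 1)
```

Note that the heavy vertex (weight 3) is assigned to a center at distance 0; with weighted costs, placing heavy vertices far from their centers dominates the objective.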
Policy Smoothing for Provably Robust Reinforcement Learning
The study of provable adversarial robustness for deep neural networks (DNNs)
has mainly focused on static supervised learning tasks such as image
classification. However, DNNs have been used extensively in real-world adaptive
tasks such as reinforcement learning (RL), making such systems vulnerable to
adversarial attacks as well. Prior works in provable robustness in RL seek to
certify the behaviour of the victim policy at every time-step against a
non-adaptive adversary using methods developed for the static setting. But in
the real world, an RL adversary can infer the defense strategy used by the
victim agent by observing the states, actions, etc. from previous time-steps
and adapt itself to produce stronger attacks in future steps. We present an
efficient procedure, designed specifically to defend against an adaptive RL
adversary, that can directly certify the total reward without requiring the
policy to be robust at each time-step. Our main theoretical contribution is to
prove an adaptive version of the Neyman-Pearson Lemma -- a key lemma for
smoothing-based certificates -- where the adversarial perturbation at a
particular time can be a stochastic function of current and previous
observations and states as well as previous actions. Building on this result,
we propose policy smoothing, where the agent adds Gaussian noise to its
observation at each time-step before passing it through the policy function.
Our robustness certificates guarantee that the final total reward obtained by
policy smoothing remains above a certain threshold, even though the actions at
intermediate time-steps may change under the attack. Our experiments on various
environments like Cartpole, Pong, Freeway and Mountain Car show that our method
can yield meaningful robustness guarantees in practice.
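The smoothing mechanism itself is simple to sketch. Below is a minimal, hypothetical illustration (the toy policy and parameter values are ours, not the paper's) of adding Gaussian noise to each observation before the policy sees it:

```python
import numpy as np

def smoothed_action(policy, obs, sigma, rng):
    """Policy smoothing: perturb the observation with Gaussian noise of
    scale sigma before querying the policy."""
    noisy_obs = obs + rng.normal(0.0, sigma, size=obs.shape)
    return policy(noisy_obs)

# Toy deterministic policy: act 1 iff the (noisy) first coordinate is negative.
policy = lambda o: 1 if o[0] < 0 else 0

rng = np.random.default_rng(0)
obs = np.array([-0.3, 0.0])
actions = [smoothed_action(policy, obs, sigma=0.1, rng=rng) for _ in range(1000)]
frac = sum(actions) / len(actions)
print(frac)  # close to 1.0: noise at 3 sigma from the decision boundary rarely flips the action
```

The certificate machinery (the adaptive Neyman-Pearson argument) lives on top of this randomization; the noise itself is the entire runtime intervention.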
On the Cost of Essentially Fair Clusterings
Clustering is a fundamental tool in data mining. It partitions points into
groups (clusters) and may be used to make decisions for each point based on its
group. However, this process may harm protected (minority) classes if the
clustering algorithm does not adequately represent them in desirable clusters
-- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering
requiring the representation in each cluster to (approximately) preserve the
global fraction of each protected class. Restricting to two protected classes,
they developed both a 4-approximation for the fair k-center problem and an
O(t)-approximation for the fair k-median problem, where t is a parameter
for the fairness model. For multiple protected classes, the best known result
is a 14-approximation for fair k-center.
We extend and improve the known results. Firstly, we give a 5-approximation
for the fair k-center problem with multiple protected classes. Secondly, we
propose a relaxed fairness notion under which we can give bicriteria
constant-factor approximations for all of the classical clustering objectives:
k-center, k-supplier, k-median, k-means and facility location. The
latter approximations are achieved by a framework that takes an arbitrary
existing unfair (integral) solution and a fair (fractional) LP solution and
combines them into an essentially fair clustering with a weakly supervised
rounding scheme. In this way, a fair clustering can be established belatedly,
in a situation where the centers are already fixed.
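To make the fairness notion concrete, here is a small, simplified check of our own (a multiplicative variant; the paper's exact and relaxed notions differ in detail) that compares each cluster's group fractions to the global fractions:

```python
from collections import Counter

def is_alpha_fair(labels, groups, alpha):
    """Check whether each cluster's fraction of each protected group stays
    within a factor alpha of that group's global fraction (alpha >= 1)."""
    n = len(labels)
    global_frac = {g: c / n for g, c in Counter(groups).items()}
    clusters = {}
    for lab, g in zip(labels, groups):
        clusters.setdefault(lab, []).append(g)
    for members in clusters.values():
        counts = Counter(members)
        for g, gf in global_frac.items():
            f = counts.get(g, 0) / len(members)
            if not (gf / alpha <= f <= gf * alpha):
                return False
    return True

groups = ['a', 'a', 'b', 'b']  # two protected classes, 50/50 globally
print(is_alpha_fair([0, 1, 0, 1], groups, alpha=1.0))  # True: every cluster is 50/50
print(is_alpha_fair([0, 0, 1, 1], groups, alpha=1.0))  # False: clusters are all-a / all-b
```

The second clustering may have lower cost for some objective, yet it segregates the two classes completely, which is exactly what the fairness constraint rules out.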
Provable Robustness for Streaming Models with a Sliding Window
The literature on provable robustness in machine learning has primarily
focused on static prediction problems, such as image classification, in which
input samples are assumed to be independent and model performance is measured
as an expectation over the input distribution. Robustness certificates are
derived for individual input instances with the assumption that the model is
evaluated on each instance separately. However, in many deep learning
applications such as online content recommendation and stock market analysis,
models use historical data to make predictions. Robustness certificates based
on the assumption of independent input samples are not directly applicable in
such scenarios. In this work, we focus on the provable robustness of machine
learning models in the context of data streams, where inputs are presented as a
sequence of potentially correlated items. We derive robustness certificates for
models that use a fixed-size sliding window over the input stream. Our
guarantees hold for the average model performance across the entire stream and
are independent of stream size, making them suitable for large data streams. We
perform experiments on speech detection and human activity recognition tasks
and show that our certificates can produce meaningful performance guarantees
against adversarial perturbations.
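A minimal sketch of the setting (the toy model and threshold are ours, not from the paper): a fixed-size sliding window moves over the stream, and each prediction depends on the current window's contents.

```python
from collections import deque

def windowed_predictions(model, stream, window_size):
    """Run a model over a data stream with a fixed-size sliding window:
    each prediction sees only the most recent `window_size` items."""
    window = deque(maxlen=window_size)
    outputs = []
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            outputs.append(model(tuple(window)))
    return outputs

# Toy "model": flag a window whose sum exceeds a threshold.
model = lambda w: int(sum(w) > 4)
stream = [1, 2, 3, 0, 0, 5]
preds = windowed_predictions(model, stream, window_size=3)
print(preds)  # [1, 1, 0, 1]
```

The key consequence for robustness is visible even here: perturbing one stream item affects up to `window_size` overlapping windows, which is why per-instance certificates do not transfer directly and the paper certifies average performance over the whole stream instead.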
Certifying LLM Safety against Adversarial Prompting
Large language models (LLMs) released for public use incorporate guardrails
to ensure their output is safe, often referred to as "model alignment." An
aligned language model should decline a user's request to produce harmful
content. However, such safety measures are vulnerable to adversarial prompts,
which contain maliciously designed token sequences to circumvent the model's
safety guards and cause it to produce harmful content. In this work, we
introduce erase-and-check, the first framework to defend against adversarial
prompts with verifiable safety guarantees. We erase tokens individually and
inspect the resulting subsequences using a safety filter. Our procedure labels
the input prompt as harmful if the prompt itself or any of its subsequences is
detected as harmful by the filter. This guarantees that any adversarial
modification of a harmful prompt up to a certain size is also labeled harmful.
We defend against three attack modes: i) adversarial suffix, which appends an
adversarial sequence at the end of the prompt; ii) adversarial insertion, where
the adversarial sequence is inserted anywhere in the middle of the prompt; and
iii) adversarial infusion, where adversarial tokens are inserted at arbitrary
positions in the prompt, not necessarily as a contiguous block. Empirical
results demonstrate that our technique obtains strong certified safety
guarantees on harmful prompts while maintaining good performance on safe
prompts. For example, against adversarial suffixes of length 20, it certifiably
detects 93% of the harmful prompts and labels 94% of the safe prompts as safe
using the open-source language model Llama 2 as the safety filter.
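The procedure for the suffix mode can be sketched as follows (the toy filter and prompts are hypothetical; a real deployment would use an LLM-based safety filter as in the paper):

```python
def erase_and_check_suffix(prompt_tokens, is_harmful, max_erase):
    """Erase-and-check, suffix mode: label the prompt harmful if it, or any
    prefix obtained by erasing up to `max_erase` trailing tokens, is flagged
    by the safety filter."""
    for d in range(max_erase + 1):
        candidate = prompt_tokens[:len(prompt_tokens) - d] if d else prompt_tokens
        if candidate and is_harmful(candidate):
            return True
    return False

# Toy filter: recognizes only the exact harmful base prompt, so it would miss
# the adversarially suffixed version on its own.
is_harmful = lambda toks: toks == ["how", "to", "attack"]

attacked = ["how", "to", "attack", "x", "y"]  # adversarial suffix "x y"
safe     = ["how", "to", "bake", "bread"]
print(erase_and_check_suffix(attacked, is_harmful, max_erase=2))  # True
print(erase_and_check_suffix(safe, is_harmful, max_erase=2))      # False
```

The certificate follows directly from the loop: any suffix of length at most `max_erase` appended to a prompt the filter flags is erased at some iteration, so the attacked prompt is still labeled harmful.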
Can AI-Generated Text be Reliably Detected?
In this paper, both empirically and theoretically, we show that several
AI-text detectors are not reliable in practical scenarios. Empirically, we show
that paraphrasing attacks, where a light paraphraser is applied on top of a
large language model (LLM), can break a whole range of detectors, including
ones using watermarking schemes as well as neural network-based detectors and
zero-shot classifiers. Our experiments demonstrate that retrieval-based
detectors, designed to evade paraphrasing attacks, are still vulnerable to
recursive paraphrasing. We then provide a theoretical impossibility result
indicating that as language models become more sophisticated and better at
emulating human text, the performance of even the best-possible detector
decreases. For a sufficiently advanced language model seeking to imitate human
text, even the best-possible detector may only perform marginally better than a
random classifier. Our result is general enough to capture specific scenarios
such as particular writing styles, clever prompt design, or text paraphrasing.
We also extend the impossibility result to include the case where pseudorandom
number generators are used for AI-text generation instead of true randomness.
We show that the same result holds with a negligible correction term for all
polynomial-time computable detectors. Finally, we show that even LLMs protected
by watermarking schemes can be vulnerable against spoofing attacks where
adversarial humans can infer hidden LLM text signatures and add them to
human-generated text to be detected as text generated by the LLMs, potentially
causing reputational damage to their developers. We believe these results can
open an honest conversation in the community regarding the ethical and reliable
use of AI-generated text.
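The shape of the impossibility result can be illustrated numerically. The paper bounds the AUROC of any detector in terms of the total-variation distance between the AI-text and human-text distributions; the sketch below (our own code, using the bound 1/2 + TV - TV^2/2 as stated in the paper) shows the bound collapsing toward 1/2, i.e. a random classifier, as the distributions converge:

```python
def auroc_upper_bound(tv):
    """Best-possible detector AUROC as a function of the total-variation
    distance `tv` between the AI-text and human-text distributions."""
    return 0.5 + tv - tv**2 / 2

for tv in (1.0, 0.5, 0.1):
    print(f"TV = {tv}: AUROC <= {auroc_upper_bound(tv):.3f}")
# TV = 1.0 gives 1.000 (perfectly separable distributions);
# TV = 0.1 gives only 0.595, barely above chance.
```

As language models better emulate human text, TV shrinks, and the bound applies to every detector, watermark-based or not, which is what makes the result an impossibility rather than a weakness of current detectors.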